---
layout: page
title: Data Science Master's Thesis
htmlwidgets: TRUE
permalink: /predictive-analytics-thesis/
---

Welcome to my spot on the web for drafts, supplemental material, and general thoughts about doing a thesis project for the Master of Science in Predictive Analytics degree from Northwestern University (the program is now called the Master's in Data Science, or MSDS). Below the interactive plots, I'm developing a sort of "epilogue" containing thoughts about doing a data science Master's, choosing the thesis option, and some of the things I've learned along the way.

Thesis Paper

I'll update this section with drafts as they get finished.

2018-11-04: I have a (mostly) completed draft you can check out here on Google Drive. I'm currently awaiting comments from readers, so no doubt it will change substantially. I haven't put in a Table of Contents, but everything else is there (hooray!).

Code

All the code (mostly in R) for the thesis can be found in the project repo on GitHub.

Supplemental Material

Interactive Multidimensional Scaling Plots

These are interactive multidimensional scaling (MDS) plots of genetic profiles developed from open-source RNA-seq data available from the Aging, Dementia, and TBI Study from the Allen Institute for Brain Science.

Use your mouse to grab them, rotate them, and zoom in and out. Hovering over a data point gives the point's coordinates in the first three MDS dimensions. Each point represents a genetic profile (based on expression levels for 50,000+ genes) for an individual patient/donor.

These were made using Plotly and htmlwidgets for R. Check out this blog post for more on multidimensional scaling of gene expression level data.
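For readers who want a feel for how a table of expression levels turns into three plot coordinates, here is a minimal sketch of classical MDS. It is not the thesis code (that lives in the project repo and is mostly R); it is a Python/NumPy stand-in that assumes plain Euclidean distances between profiles and runs on made-up data. The function name `classical_mds` and the toy matrix are hypothetical, and the actual plots may have used a different distance measure or MDS variant.

```python
import numpy as np


def classical_mds(profiles, n_dims=3):
    """Project rows of `profiles` (samples x genes) into n_dims via classical MDS.

    Assumption: Euclidean distance between expression profiles.
    """
    X = np.asarray(profiles, dtype=float)

    # Squared Euclidean distances between every pair of samples.
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding

    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ d2 @ J

    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Made-up data: 6 samples x 20 genes (the real profiles cover 50,000+ genes).
rng = np.random.default_rng(0)
toy_profiles = rng.normal(size=(6, 20))
coords = classical_mds(toy_profiles)
print(coords.shape)  # (6, 3)
```

Each row of `coords` plays the role of one point in the plots: its three values are what you would see when hovering over that point.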

Shaded by Brain Region

HIP = hippocampus
FWM = forebrain white matter
PCx = parietal cortex
TCx = temporal cortex

[Interactive Plotly plot: 3D MDS plot shaded by brain region]

Shaded by Donor Sex

[Interactive Plotly plot: 3D MDS plot shaded by donor sex]

Shaded by Lifetime Number of Traumatic Brain Injuries (TBIs)

[Interactive Plotly plot: 3D MDS plot shaded by lifetime number of TBIs]

Shaded by Dementia Status

[Interactive Plotly plot: 3D MDS plot shaded by dementia status]

Differential Expression Analysis Filtering & p-Value Cutoff Experiments

A comparison of the numbers of "significant" genes obtained with different filtering parameters and p-value cutoffs for determining differentially expressed genes in donors with dementia.

Filtering & P-Value Cutoff Experiment Spreadsheet
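To make the idea of the experiment concrete, here is a small sketch of counting "significant" genes over a grid of filter settings and p-value cutoffs. Everything in it is an assumption for illustration: the gene names, expression values, and p-values are made up, the filter is a simple minimum mean expression, and the filter is applied after testing to keep things short. In a real analysis the filtering usually happens before testing, which also changes the multiple-testing adjustment, so treat this as the bookkeeping only. The thesis code in the repo is the real reference.

```python
from itertools import product

# Hypothetical per-gene results: (gene id, mean expression, p-value).
toy_results = [
    ("gene_a", 12.0, 0.001),
    ("gene_b", 0.4, 0.003),
    ("gene_c", 5.5, 0.030),
    ("gene_d", 2.1, 0.200),
    ("gene_e", 8.9, 0.049),
]


def count_significant(results, min_expression, p_cutoff):
    """Count genes that pass the expression filter and fall below the cutoff."""
    return sum(
        1
        for _, expression, p_value in results
        if expression >= min_expression and p_value < p_cutoff
    )


def cutoff_grid(results, min_expressions, p_cutoffs):
    """Map every (filter, cutoff) combination to its count of significant genes."""
    return {
        (min_expr, cutoff): count_significant(results, min_expr, cutoff)
        for min_expr, cutoff in product(min_expressions, p_cutoffs)
    }


grid = cutoff_grid(toy_results, min_expressions=[0.0, 1.0], p_cutoffs=[0.01, 0.05])
for (min_expr, cutoff), n_genes in sorted(grid.items()):
    print(f"min_expression={min_expr}, p<{cutoff}: {n_genes} genes")
```

Even on five toy genes, the count ranges from 1 to 4 depending on the settings, which is the sort of sensitivity the spreadsheet tracks.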

Brain Region Intersection Gene Details

As part of the exploratory analysis of the RNA-seq transcriptome data, I investigated the 29 genes that had altered expression patterns in all four brain regions sampled from donors with dementia (hippocampus, forebrain white matter, parietal cortex, and temporal cortex).

Brain Region Intersection Gene Details
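The "in all four regions" step boils down to a set intersection. Here is a minimal sketch with placeholder gene IDs. The sets below are made up and are not the 29 genes from the analysis, and `shared_genes` is a hypothetical helper rather than a function from the thesis repo.

```python
# Placeholder gene sets per brain region (not real results).
region_hits = {
    "HIP": {"GENE1", "GENE2", "GENE3"},
    "FWM": {"GENE2", "GENE3", "GENE4"},
    "PCx": {"GENE2", "GENE3", "GENE5"},
    "TCx": {"GENE2", "GENE3", "GENE6"},
}


def shared_genes(hits_by_region):
    """Return the genes that appear in every region's set."""
    gene_sets = list(hits_by_region.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


shared = shared_genes(region_hits)
print(sorted(shared))  # ['GENE2', 'GENE3']
```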

Epilogue

Things I've Learned by Doing a Data Science Master's Thesis

Maybe you stumbled onto this page because you're thinking of pursuing a data science Master's degree. Or maybe you're already in the MSDS program at Northwestern or somewhere else and are trying to make the "thesis or capstone" decision. In this section, I'll be keeping a list of some of the things I've learned from doing this degree, with a focus on doing a thesis project. Just my $0.02. FWIW, etc. I'm putting it down here as a sort of epilogue to the thesis once she's all done.